The Fundamentals of Norm Approximation
MATH008 Lesson 6
00:00
Imagine you are a tailor trying to fit a standard suit (the range of $A$) to a client with unique proportions (the vector $b$). No matter how you adjust the sleeve or waist (the coefficients $x$), the suit will never be a perfect skin-tight match. You are looking for the "best" compromise—a norm approximation that minimizes the tension or "residual" at every seam.

The Mathematical Framework

The core objective is to find a vector $x \in \mathbb{R}^n$ such that the linear combination $Ax = x_1a_1 + \dots + x_na_n$ best approximates $b$. This is frequently referred to as the regression of $b$ onto the regressors (the columns of $A$).

We focus on the residual vector $r = Ax - b$. In practice, we assume an overdetermined system where $m > n$. Why? Because when $m = n$ and $A$ is non-singular, the optimal point is simply $A^{-1}b$, yielding zero error—a trivial case for optimization.

🎯 Core Principle
The norm approximation problem (6.1) is a convex problem and is guaranteed to be solvable. There is always at least one optimal solution $\hat{x}$ that minimizes the distance between the target and the achievable subspace.

Canonical Variations

Depending on the "flavor" of error we want to penalize, we choose different norms:

1. Least-Squares ($\ell_2$ Norm)

The most common approach. It minimizes the sum of squared residuals, $\|Ax - b\|_2^2$. It is sensitive to large outliers, but it admits a closed-form solution via the normal equations $A^TAx = A^Tb$.
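As a sketch of the least-squares case, the snippet below sets up a small overdetermined system with made-up data and confirms that solving the normal equations directly agrees with NumPy's dedicated least-squares routine:

```python
import numpy as np

# Hypothetical overdetermined system: m = 5 measurements, n = 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Solve the normal equations A^T A x = A^T b directly...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ...or use the numerically preferred least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))
```

In practice `lstsq` (based on an orthogonal factorization) is preferred over forming $A^TA$, which squares the condition number of the problem.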

2. Chebyshev / Minimax ($\ell_\infty$ Norm)

Minimizes the maximum absolute residual $\max_i |r_i|$. This is used when every single measurement must stay within a strict tolerance. It can be solved as a Linear Program (LP) with a scalar auxiliary variable $t$:

minimize $t$
subject to $-t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1}$
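This LP can be handed to any generic solver. A minimal sketch using `scipy.optimize.linprog` (with invented data; the stacked variable is $z = (x, t)$) follows, checking that the optimal $t$ equals the largest absolute residual:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for the Chebyshev problem.
rng = np.random.default_rng(1)
m, n = 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Stack variables as z = (x, t) and minimize t.
c = np.r_[np.zeros(n), 1.0]
# Encode  Ax - b <= t*1  and  -(Ax - b) <= t*1  as A_ub @ z <= b_ub.
A_ub = np.block([[A, -np.ones((m, 1))],
                 [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + 1))  # all variables free
x_cheb, t_opt = res.x[:n], res.x[n]

# At the optimum, t equals the maximum absolute residual.
print(abs(np.max(np.abs(A @ x_cheb - b)) - t_opt) < 1e-5)
```

Note that `linprog` defaults to nonnegative variables, so the explicit `bounds` argument is needed to leave $x$ and $t$ free.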

3. Sum of Absolute Residuals ($\ell_1$ Norm)

Minimizes $\sum_i |r_i|$. This approach is robust to outliers, since large errors are not squared. It is also solvable via an LP, now with a vector auxiliary variable $t \in \mathbb{R}^m$ that bounds each residual componentwise:

minimize $\mathbf{1}^T t$
subject to $-t \preceq Ax - b \preceq t$
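The $\ell_1$ LP differs from the Chebyshev one only in that $t$ is now a vector with one entry per residual. A sketch with invented data, again via `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for the l1 problem.
rng = np.random.default_rng(2)
m, n = 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# z = (x, t) with t in R^m bounding each |r_i|; minimize 1^T t.
c = np.r_[np.zeros(n), np.ones(m)]
I = np.eye(m)
A_ub = np.block([[A, -I],
                 [-A, -I]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + m))
x_l1 = res.x[:n]

# At the optimum, 1^T t equals the l1 norm of the residual.
print(abs(res.fun - np.sum(np.abs(A @ x_l1 - b))) < 1e-5)
```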

Estimation Context

In many engineering fields, we assume a true state $x$ is obscured by noise: $y = Ax + v$. Our goal is to find an estimate $\hat{x} = \operatorname{argmin}_z \|Az - y\|$. By choosing the norm, we are implicitly assuming a statistical distribution for the noise $v$: the $\ell_2$ norm gives the maximum-likelihood estimate under Gaussian noise, while the $\ell_1$ norm corresponds to heavier-tailed Laplacian noise.
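To illustrate this estimation viewpoint, the sketch below (with invented data and a hypothetical true state) corrupts one measurement with a gross outlier and compares the $\ell_2$ and $\ell_1$ estimates; the $\ell_1$ estimate is typically far less affected:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical true state, noisy measurements, and one gross outlier.
rng = np.random.default_rng(3)
m, n = 30, 2
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0])
y = A @ x_true + 0.1 * rng.standard_normal(m)
y[0] += 50.0  # a single wildly corrupted measurement

# l2 estimate: ordinary least squares.
x_l2, *_ = np.linalg.lstsq(A, y, rcond=None)

# l1 estimate: the sum-of-absolute-residuals LP with z = (x, t).
c = np.r_[np.zeros(n), np.ones(m)]
I = np.eye(m)
A_ub = np.block([[A, -I], [-A, -I]])
b_ub = np.r_[y, -y]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + m))
x_l1 = res.x[:n]

print("l2 estimation error:", np.linalg.norm(x_l2 - x_true))
print("l1 estimation error:", np.linalg.norm(x_l1 - x_true))
```

The outlier drags the least-squares estimate away from $x$ because its error is squared, whereas the $\ell_1$ objective penalizes it only linearly.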

Geometric Interpretation

Finally, the problem can be restated as finding the point in the achievable subspace closest to $b$:

minimize $\|u - b\|$ subject to $u \in \mathcal{A}$, where $\mathcal{A} = \text{Range}(A)$

The optimal $\hat{x}$ is any coefficient vector with $A\hat{x} = \hat{u}$, the projection of $b$ onto $\mathcal{A}$ in the chosen norm.